Abstract: This paper presents results pertaining to sequential
methods for support recovery of sparse signals in noise.
Specifically, we show that any sequential measurement procedure fails
provided the average number of measurements per dimension
grows slower than $D(f_0\|f_1)^{-1} \log s$, where $s$ is the level of
sparsity and $D(f_0\|f_1)$ is the Kullback-Leibler divergence between
the underlying distributions. For comparison, we show any
non-sequential procedure fails provided the number of measurements
grows at a rate less than $D(f_1\|f_0)^{-1} \log n$, where $n$ is the total
dimension of the problem. Lastly, we show that a simple procedure
termed sequential thresholding guarantees exact support recovery
provided the average number of measurements per dimension
grows faster than $D(f_0\|f_1)^{-1} (\log s + \log\log n)$, a mere additive
factor more than the lower bound.

I. Introduction 

High-dimensional signal support recovery is a fundamental 
problem arising in many aspects of science and engineering. 
The goal of the basic problem is to determine, based on noisy 
observations, a sparse set of elements that somehow differ 
from the others. 

In this paper we study the following problem. Consider a
support set $S \subset \{1, \dots, n\}$ and a random variable $y_{i,j}$
such that

$$ y_{i,j} \sim \begin{cases} f_0(\cdot) & i \notin S \\ f_1(\cdot) & i \in S \end{cases} \qquad (1) $$

where $f_0(\cdot)$ and $f_1(\cdot)$ are probability measures on
$\mathcal{Y}$, and $j$ indexes multiple independent measurements of any
component $i \in \{1, \dots, n\}$. The dimension of the problem, $n$, is
large - perhaps thousands or millions or more - but the support set $S$
is sparse in the sense that the number of elements following
$f_1$ is much less than the dimension, i.e., $|S| = s \ll n$. The
goal of the sparse recovery problem is to identify the set $S$.

In a non-sequential setting $m \geq 1$ independent observations
of each component are made ($y_{i,1}, \dots, y_{i,m}$ are observed for
each $i$) and the fundamental limits of reliable recovery are
readily characterized in terms of Kullback-Leibler divergence
and dimension.

Sequential approaches to the high dimensional support
recovery problem have been given much attention recently (see
[1], [2], [3], [4], etc). In the sequential setting, the decision to
observe $y_{i,j}$ is based on prior observations, i.e.
$y_{i,1}, \dots, y_{i,j-1}$. Herein lies the advantage of sequential
methods: if prior measurements indicate a particular component
belongs (or doesn't belong) to $S$ with sufficient certainty,
measurement of that component can cease, and resources can be
diverted to a more uncertain element.

The results presented in this paper are in terms of the asymptotic
rate at which the average number of measurements per dimension,
denoted $m$, must increase with $n$ to ensure exact recovery
of $S$ for any fixed distributions $f_0$ and $f_1$. The main
contributions are 1) to present a necessary condition for success of
any sequential procedure in the sparse setting, 2) to show that
success of a simple sequential procedure first presented in [2] is
guaranteed provided the average number of measurements per dimension
is within a small additive factor of the necessary condition for
any sequential procedure, and to compare this procedure to the
known optimal sequential probability ratio test, and 3) to
compare these results to the performance limits of any
non-sequential procedure. Table I summarizes these results.

TABLE I
Average number of measurements per dimension for exact recovery

Procedure                 Condition                                                           Type
Non-sequential            $m \geq \frac{\log n}{D(f_1\|f_0)}$                                 necessary
Sequential                $m \geq \frac{\log s}{D(f_0\|f_1)}$                                 necessary
Sequential Thresholding   $m > \frac{\log s}{D(f_0\|f_1)} + \frac{\log\log n}{D(f_0\|f_1)}$   sufficient if $s$ sub-linear in $n$

Our results are striking primarily for two reasons. First,
sequential procedures succeed when the number of measurements
per dimension increases at a rate logarithmic in the level
of sparsity, i.e. $\log s$. In contrast, non-sequential procedures
require the average number of measurements per dimension
to increase at a rate logarithmic in the dimension, i.e. $\log n$.
For signals where sparsity is sublinear in dimension, the gains
of sequential methods are polynomial; in scenarios where the
sparsity grows logarithmically, the gains are exponential.

Second, and perhaps equally surprising, a simple procedure
dubbed sequential thresholding achieves nearly optimal
performance provided minor constraints on the level of sparsity
are met (specifically, that $s$ is sublinear in $n$). In terms
of the average number of measurements per dimension, the
procedure comes within an additive factor, doubly logarithmic
in dimension, of the lower bound of any sequential procedure.



II. Problem Formulation 

Let $S$ be a sparse subset of $\{1, \dots, n\}$ with cardinality
$s = |S|$. For any index $i \in \{1, \dots, n\}$, the random variable
$y_{i,j}$ is independent, identically distributed according to (1).
That is, for all $j$, $y_{i,j}$ follows distribution $f_1(\cdot)$ if $i$
belongs to $S$, and follows $f_0(\cdot)$ otherwise. We refer to $f_0$ as
the null distribution, and $f_1$ the alternative.

In this paper, we limit our analysis to exact recovery of $S$
using coordinate wise methods. Defining $\hat{S}$ as an estimate of
$S$, the family wise error rate is given as:

$$ P(\mathcal{E}) = P(\hat{S} \neq S) = P\Big( \bigcup_{i \notin S} \mathcal{E}_i \cup \bigcup_{i \in S} \mathcal{E}_i \Big) $$

where $\mathcal{E}_i$, $i \notin S$ is a false positive error event and
$\mathcal{E}_i$, $i \in S$ a false negative error event. To simplify
notation, we define the false positive and false negative
probabilities in the usual manner:
$\alpha = P(\mathcal{E}_i \mid i \notin S)$, and
$\beta = P(\mathcal{E}_i \mid i \in S)$.

The test to decide if component $i$ belongs to $S$ is based on
the normalized log-likelihood ratio. For $y_j$ distributed i.i.d. $f_0$
or $f_1$,

$$ t^{(m)} := \frac{1}{m} \sum_{j=1}^{m} \log \frac{f_1(y_j)}{f_0(y_j)} $$

which is a function of $(y_1, \dots, y_m) \in \mathcal{Y}^m$. The
superscript $m$ explicitly indicates the number of measurements used
to form the likelihood ratio and is suppressed when unambiguous. The
log-likelihood ratio is compared against a scalar threshold $\gamma$
to hypothesize if a component follows $f_0$ or $f_1$.

Additionally, the Kullback-Leibler divergence of distribution
$f_0$ from $f_1$ is defined as:

$$ D(f_1\|f_0) = E_1\left[ \log \frac{f_1(y)}{f_0(y)} \right] $$

where $E_1[\cdot]$ is expectation with respect to distribution $f_1$.
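As a worked instance of this definition, consider the Gaussian pair $f_0 = \mathcal{N}(0,1)$, $f_1 = \mathcal{N}(\theta,1)$ (an illustrative assumption, chosen because the log-likelihood ratio is linear in $y$). In that case the two divergences coincide:

```latex
\log\frac{f_1(y)}{f_0(y)} = \theta y - \frac{\theta^2}{2}
\quad\Longrightarrow\quad
D(f_1\|f_0) = E_1\!\left[\theta y - \frac{\theta^2}{2}\right]
            = \theta^2 - \frac{\theta^2}{2} = \frac{\theta^2}{2},
\qquad
D(f_0\|f_1) = E_0\!\left[\frac{\theta^2}{2} - \theta y\right] = \frac{\theta^2}{2}.
```

For general distribution pairs the two divergences differ, which is why the sequential and non-sequential bounds are stated with different arguments.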

A. Measurement procedures 

To be precise in characterizing a measurement procedure, 
we continue with three definitions. 

Definition II.1. Measurement Procedure. A procedure, denoted
$\pi$, used to determine if $y_{i,j}$ is measured. If $\pi_{i,j} = 1$,
then $y_{i,j}$ is measured. Conversely, if $\pi_{i,j} = 0$, then
$y_{i,j}$ is not measured.

Definition II.2. Non-sequential measurement procedure. Any
measurement procedure $\pi$ such that $\pi_{i,j}$ is not a function of
$y_{i,j'}$ for any $j'$.

Definition II.3. Sequential measurement procedure. A
measurement procedure $\pi$ in which $\pi_{i,j}$ is allowed to depend on
prior measurements, specifically,
$\pi_{i,j} : \{y_{i,1}, \dots, y_{i,j-1}\} \mapsto \{0, 1\}$.



B. Measurement Budget 

In order to make fair comparison between measurement
schemes, we limit the total number of observations of $y_{i,j}$
in expectation. For any procedure $\pi$, we require

$$ E\Big[ \sum_{i,j} \pi_{i,j} \Big] \leq nm \qquad (2) $$

for some integer $m$. This implies, on average, we use $m$ or
fewer observations per dimension.

III. Sequential Thresholding 

Sequential thresholding, first presented in [2], relies on
a simple bisection idea. The procedure consists of a series 
of K measurement passes, where each pass eliminates from 
consideration a proportion of the components measured on 
the prior pass. After the last pass the procedure terminates 
and the remaining components are taken as the estimate 
of S. Sequential thresholding is described in the following 
algorithm. 

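Sequential Thresholding

input: $K \approx \log n$ steps, threshold $\gamma$
initialize: $\pi_{i,1} = 1$ for all $i$
for $k = 1, \dots, K$ do
    for $\{i : \pi_{i,k} = 1\}$ do
        measure: $t_{i,k}^{(pm)}$
        threshold: $\pi_{i,k+1} = 1$ if $t_{i,k}^{(pm)} > \gamma$, and $0$ otherwise
    end for
end for

output: $\hat{S} = \{i : \pi_{i,K+1} = 1\}$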



A. Example of Sequential Thresholding 

Sequential thresholding is perhaps best illustrated by example.
Consider a simple case with measurement budget $m = 2$,
$f_0 \sim \mathcal{N}(0, 1)$ and $f_1 \sim \mathcal{N}(\theta, 1)$ for
some $\theta > 0$.

On the first pass, the procedure measures $y_{i,1}$ for all $i$,
using $n$ measurements (half of the total budget, as $mn = 2n$).
On subsequent passes, the procedure observes $y_{i,k}$ if
$\pi_{i,k} = 1$. To set $\pi_{i,k+1}$ the procedure thresholds
observations that fall below, for example, $\gamma = 0$, eliminating a
proportion (approximately half in this case) of components following
the null distribution:

$$ \pi_{i,k+1} = \begin{cases} 1 & y_{i,k} > 0 \\ 0 & y_{i,k} \leq 0. \end{cases} $$

In words, if a measurement of component $i$ falls below the
threshold on any pass, that component is not measured for the
remainder of the procedure, and not included in the estimate
of $S$. After $K \approx \log n$ passes, the procedure terminates, and
estimates $S$ as the set of indices that have not been eliminated
from consideration: $\hat{S} = \{i : \pi_{i,K+1} = 1\}$.
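The example can be simulated in a few lines. The sketch below takes one measurement per surviving component per pass and thresholds at zero; the function name, the choice $\theta = 6$, the support indices, the seed and the use of base-2 logarithms for the number of passes are all illustrative assumptions, not values from the paper.

```python
import math
import random


def sequential_thresholding_gaussian(n, support, theta, num_passes, rng):
    """Sequential thresholding with f0 = N(0,1), f1 = N(theta,1).

    One measurement per surviving component per pass, threshold at 0.
    Returns the estimated support and the number of measurements used.
    """
    active = set(range(n))  # indices still under consideration
    measurements = 0
    for _ in range(num_passes):
        survivors = set()
        for i in active:
            mean = theta if i in support else 0.0
            y = rng.gauss(mean, 1.0)
            measurements += 1
            if y > 0.0:  # keep the component only if it clears the threshold
                survivors.add(i)
        active = survivors
    return active, measurements


rng = random.Random(0)            # fixed seed, for repeatability only
n, theta = 1024, 6.0              # hypothetical problem size and signal level
support = {3, 141, 592}           # hypothetical support set S
K = math.ceil(math.log2(n))       # K ~ log n passes (base 2 assumed here)
estimate, used = sequential_thresholding_gaussian(n, support, theta, K, rng)
```

Since roughly half of the null components are removed on each pass, the measurement count `used` should stay close to $2n$, in line with the budget $m = 2$. A few null components may survive all $K$ passes when $K$ is this small.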



B. Details of Sequential Thresholding 

Sequential thresholding requires two inputs: 1) $K$, the
number of passes, and 2) $\gamma$, a threshold. We define $p$ as the
probability a component following the null is eliminated on
any given pass, which is related to the threshold as

$$ P\big( t_i^{(pm)} < \gamma \mid i \notin S \big) = p. $$

Additionally, we restrict our consideration to $p \in [1/2, 1)$ -
that is, the probability a null component is eliminated on a
given pass is one half or greater.

On each pass, $pm$ (which we assume to be an integer)
measurements of a subset of components are made, and the
log-likelihood ratio $t_{i,k}^{(pm)}$ is formed for each component.
As measurements are made in blocks of size $pm$, we use
boldface $\boldsymbol{\pi}_{i,k}$ to indicate a block of measurements
are taken of component $i$ on the $k$th measurement pass.
$\boldsymbol{\pi}_{i,k}$ can be interpreted as a vector:

$$ \boldsymbol{\pi}_{i,k} = (\pi_{i,(k-1)pm+1}, \dots, \pi_{i,kpm}). $$

With $\gamma$ and $K \approx \log n$ as inputs, sequential thresholding
operates as follows. First, the procedure initializes, setting
$\boldsymbol{\pi}_{i,1} = 1$. For passes $k = 1, \dots, K$ the procedure
measures $t_{i,k}^{(pm)}$ if $\boldsymbol{\pi}_{i,k} = 1$. To set
$\boldsymbol{\pi}_{i,k+1}$, the procedure tests the
corresponding log-likelihood ratio against the threshold $\gamma$:

$$ \boldsymbol{\pi}_{i,k+1} = \begin{cases} 1 & \text{if } t_{i,k}^{(pm)} > \gamma \\ 0 & \text{else.} \end{cases} $$

That is, if $t_{i,k}^{(pm)}$ is below $\gamma$, no further measurements
of component $i$ are taken. Otherwise, component $i$ is measured
on the subsequent pass. By definition of $\gamma$, approximately $p$
times the number of remaining components following $f_0$ will
be eliminated on each pass; if $s \ll n$, each thresholding
step eliminates approximately $p$ times the total number of
components remaining.

After pass $K$, the procedure terminates and estimates $S$ as
the indices still under consideration:
$\hat{S} = \{i : \boldsymbol{\pi}_{i,K+1} = 1\}$.
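To make the link between the threshold and the elimination probability concrete, the sketch below solves for the threshold in closed form under the illustrative Gaussian assumption $f_0 = \mathcal{N}(0,1)$, $f_1 = \mathcal{N}(\theta,1)$ with $\theta > 0$. With a block of `block_size` measurements the statistic equals $\theta \bar{y} - \theta^2/2$, and under the null $\bar{y} \sim \mathcal{N}(0, 1/\text{block\_size})$. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def gaussian_threshold(p, block_size, theta):
    """Threshold gamma with P(t < gamma | null) = p.

    Assumes f0 = N(0,1), f1 = N(theta,1), theta > 0, and a block of
    block_size i.i.d. measurements per pass.
    """
    z = NormalDist().inv_cdf(p)  # standard normal quantile
    return theta * z / math.sqrt(block_size) - theta ** 2 / 2


def null_elimination_prob(gamma, block_size, theta):
    """P(t < gamma | null) under the same Gaussian assumption."""
    z = math.sqrt(block_size) * (gamma + theta ** 2 / 2) / theta
    return NormalDist().cdf(z)
```

With $p = 1/2$ and a single measurement per pass this gives a threshold of $-\theta^2/2$ on the log-likelihood ratio, which is the same rule as thresholding the raw observation at zero.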

C. Measurement Budget 

Sequential thresholding satisfies the measurement budget
in (2) provided $s$ grows sublinearly with $n$. For brevity, we
argue the procedure comes arbitrarily close to satisfying the
measurement budget for large $n$:

$$ E\Big[ \sum_{i,j} \pi_{i,j} \Big] \leq \sum_{k=0}^{K-1} \big( (1-p)^k (n-s)\,pm + s\,pm \big) \leq m(n-s) + msKp. $$

Letting $K = \log n$, the procedure comes arbitrarily close to
satisfying the constraint as $n$ grows large. To be rigorous in
showing the procedure satisfies (2), $m$ can be replaced by
$m - 1$, and the analysis throughout holds.
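The second inequality in the budget bound follows from a geometric series. The step is spelled out here, using only the quantities $n$, $s$, $m$, $p$ and $K$ already defined:

```latex
\sum_{k=0}^{K-1} (1-p)^k (n-s)\,pm
  = (n-s)\,pm \cdot \frac{1-(1-p)^K}{p}
  \le m(n-s),
\qquad
\sum_{k=0}^{K-1} s\,pm = msKp .
```

Dividing the total by $nm$ gives $(n-s)/n + sKp/n$, which approaches one when $sK/n \to 0$, for instance when $K = \log n$ and $s \log n$ grows slower than $n$.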



D. Ability of Sequential Thresholding 

We present the first of the three main theorems of the paper 
to quantify the performance of sequential thresholding. 

Theorem III.1. Ability of sequential thresholding. Provided

$$ m > \frac{\log s}{D(f_0\|f_1)} + \frac{\log\log n}{D(f_0\|f_1)} \qquad (3) $$

sequential thresholding recovers $S$ with high probability. More
precisely, if

$$ \lim_{n \to \infty} \frac{m}{\log(s \log n)} > \frac{1}{D(f_0\|f_1)} $$

then $P(\mathcal{E}) \to 0$.

Proof: From a union bound on the family wise error rate,
we have

$$ P(\mathcal{E}) \leq (n-s)\alpha + s\beta. \qquad (4) $$

Employing sequential thresholding, from the definition of $\gamma$,
$\alpha = (1-p)^K$ and

$$ \beta = P\Big( \bigcup_{k=1}^{K} \{ t_{i,k}^{(pm)} < \gamma \} \,\Big|\, i \in S \Big) \leq K\, P\big( t_i^{(pm)} < \gamma \mid i \in S \big) $$

where the inequality follows from a union bound.

We can further bound the false negative error event using the
Chernoff-Stein Lemma [5], p. 384. Consider a simple binary
hypothesis test with a fixed probability of false positive at
$\alpha_0 = 1 - p$. By the Chernoff-Stein Lemma, the false negative
probability is then given as

$$ P\big( t_i^{(pm)} < \gamma \mid i \in S \big) \doteq e^{-pm D(f_0\|f_1)} $$

where $a \doteq e^{-mD}$ is equivalent to

$$ \lim_{m \to \infty} \frac{1}{m} \log a = -D. $$

This implies, for any $\epsilon_1 > 0$, for sufficiently large $m$,

$$ P\big( t_i^{(pm)} < \gamma \mid i \in S \big) \leq e^{-pm (D(f_0\|f_1) - \epsilon_1)}. $$

Letting $K = (1 + \epsilon_2) \log n$, for sufficiently large $n$ and
$m$, (4) becomes

$$ P(\mathcal{E}) \leq (n-s)(1-p)^{(1+\epsilon_2)\log n} + s (1+\epsilon_2) \log(n)\, e^{-pm (D(f_0\|f_1) - \epsilon_1)}. $$

Hence, $P(\mathcal{E})$ goes to zero provided

$$ m > \frac{\log((1+\epsilon_2)\, s \log n)}{p\,(D(f_0\|f_1) - \epsilon_1)} $$

which, as $\epsilon_1$ and $\epsilon_2$ can be made arbitrarily small,
and $p$ can be made arbitrarily close to 1, directly gives the theorem:

$$ m > \frac{\log(s \log n)}{D(f_0\|f_1)}. $$
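To get a feel for the three rates involved, the sketch below evaluates them for given $n$, $s$ and divergences. These are asymptotic rates, so the numbers are indicative only; the function names and the example values ($n = 10^6$, $s = 100$, divergence 2, which corresponds to a unit-variance Gaussian shift of $\theta = 2$) are illustrative assumptions.

```python
import math


def nonsequential_necessary(n, d10):
    """log n / D(f1||f0): rate below which non-sequential procedures fail."""
    return math.log(n) / d10


def sequential_necessary(s, d01):
    """log s / D(f0||f1): rate below which any sequential procedure fails."""
    return math.log(s) / d01


def sequential_thresholding_sufficient(n, s, d01):
    """(log s + log log n) / D(f0||f1): sufficient rate for sequential thresholding."""
    return (math.log(s) + math.log(math.log(n))) / d01


n, s, D = 10**6, 100, 2.0  # hypothetical values; D plays both divergence roles
lower = sequential_necessary(s, D)
st = sequential_thresholding_sufficient(n, s, D)
nonseq = nonsequential_necessary(n, D)
```

For these values the sequential thresholding rate sits between the sequential lower bound and the non-sequential requirement, with the gap to the lower bound equal to $\log\log n / D(f_0\|f_1)$.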



IV. Lower Bound on Sequential Procedures 

In this section we derive a lower bound on the rate at which 
m must grow with n for any sequential procedure, and relate 
sequential thresholding to the high dimensional extension of
the well known sequential probability ratio test (SPRT). 

A. Limitation of any sequential procedure 

The lower bound for any sequential procedure is presented 
in the following theorem. 

Theorem IV.1. Consider any sequential measurement procedure.
Provided

$$ m < \frac{\log s}{D(f_0\|f_1)} $$

the family wise error rate tends to one. More precisely, if

$$ \lim_{n \to \infty} \frac{m}{\log s} < \frac{1}{D(f_0\|f_1)} \qquad (5) $$

then $P(\mathcal{E}) \to 1$.

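Proof: First, we show conditions under which the family
wise error rate goes to one:

$$ \begin{aligned}
P(\mathcal{E}) &= P\Big( \bigcup_{i \notin S} \mathcal{E}_i \cup \bigcup_{i \in S} \mathcal{E}_i \Big) \\
&= 1 - P\Big( \bigcap_{i \notin S} \mathcal{E}_i^c \cap \bigcap_{i \in S} \mathcal{E}_i^c \Big) \\
&= 1 - (1-\beta)^s (1-\alpha)^{n-s} \\
&\geq 1 - e^{-\beta s} e^{-\alpha (n-s)}
\end{aligned} $$

which goes to one provided either

$$ \alpha > \frac{1}{n-s} \qquad (6) $$

or

$$ \beta > \frac{1}{s}. \qquad (7) $$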
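The last step of the chain relies on the elementary inequality $1 - x \leq e^{-x}$, applied once per component. The product form before it reflects that the coordinate-wise tests act on independent measurements:

```latex
1 - x \le e^{-x} \ \text{ for all real } x
\quad\Longrightarrow\quad
(1-\beta)^{s}\,(1-\alpha)^{n-s} \;\le\; e^{-\beta s}\, e^{-\alpha (n-s)} .
```

The right-hand side vanishes, and the error rate tends to one, once either exponent $\beta s$ or $\alpha(n-s)$ grows large in the high dimensional limit.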



Second, for a simple binary hypothesis test, we can bound
the expected number of measurements of any sequential procedure
with false positive and false negative probabilities $\alpha$
and $\beta$. To simplify notation, define:

$$ E_0[N] = E\Big[ \sum_j \pi_{i,j} \,\Big|\, i \notin S \Big], \qquad E_1[N] = E\Big[ \sum_j \pi_{i,j} \,\Big|\, i \in S \Big] $$

that is, $E_0[N]$ and $E_1[N]$ are the expected number of
measurements under $f_0$ and $f_1$ respectively. From [6], p. 21, we
have

$$ E_0[N] \geq \frac{1}{D(f_0\|f_1)} \left( \alpha \log \frac{\alpha}{1-\beta} + (1-\alpha) \log \frac{1-\alpha}{\beta} \right) $$

which is derived from a simple argument using Jensen's
inequality. The total expected number of measurements,
constrained by the measurement budget, is

$$ (n-s) E_0[N] + s E_1[N] = E\Big[ \sum_{i,j} \pi_{i,j} \Big] \leq mn. \qquad (8) $$
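The bound on $E_0[N]$ quoted from [6] is easy to evaluate numerically. The helper below is a sketch with a hypothetical name; it takes the divergence $D(f_0\|f_1)$ as a plain number and assumes $\alpha, \beta \in (0, 1)$.

```python
import math


def wald_lower_bound_null(alpha, beta, d01):
    """Lower bound on E_0[N] for a sequential test with error
    probabilities alpha (false positive) and beta (false negative).

    d01 is D(f0||f1); alpha and beta must lie strictly in (0, 1).
    """
    numerator = (alpha * math.log(alpha / (1 - beta))
                 + (1 - alpha) * math.log((1 - alpha) / beta))
    return numerator / d01


# Hypothetical example: small alpha, so the bound is close to log(1/beta)/D.
bound = wald_lower_bound_null(1e-6, 1e-2, 2.0)
```

When $\alpha$ is very small the bound is dominated by $\log(1/\beta) / D(f_0\|f_1)$, which is where the $\log s$ rate comes from once $\beta$ is on the order of $1/s$.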



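Dropping the $sE_1[N]$ term from (8), we need to find
conditions under which the inequality

$$ \frac{n-s}{D(f_0\|f_1)} \left( \alpha \log \frac{\alpha}{1-\beta} + (1-\alpha) \log \frac{1-\alpha}{\beta} \right) \leq mn $$

implies $P(\mathcal{E}) \to 1$. Dividing by $n \log s$, the inequality
becomes

$$ \frac{n-s}{D(f_0\|f_1)\, n \log s} \left( \alpha \log \frac{\alpha}{1-\beta} + (1-\alpha) \log \frac{1-\alpha}{\beta} \right) \leq \frac{m}{\log s}. $$

Imposing the condition in (5) and cancelling $D(f_0\|f_1)$ from
both sides, the above inequality requires

$$ \lim_{n \to \infty} \frac{n-s}{n \log s} \left( \alpha \log \frac{\alpha}{1-\beta} + (1-\alpha) \log \frac{1-\alpha}{\beta} \right) < 1. \qquad (9) $$

It is sufficient to show that (9) implies either
$\alpha > \frac{1}{n-s}$ or $\beta > \frac{1}{s}$ in the high
dimensional limit.

With this in mind, let $\beta = \frac{1-\epsilon_1}{s}$, and
$\alpha = \frac{1-\epsilon_2}{n-s}$ for some
$\epsilon_1, \epsilon_2 \in [0, 1)$. Taking the limit as
$n \to \infty$ in (9) and reducing terms we have:

$$ \lim_{n \to \infty} (\cdot) = 1 \qquad (10) $$

which contradicts (9), and negates our assumption that both
$\beta = \frac{1-\epsilon_1}{s}$ and $\alpha = \frac{1-\epsilon_2}{n-s}$
for $\epsilon_1, \epsilon_2 \in [0, 1)$. Hence, by (6) and (7), the
family wise error rate must go to one, completing the proof.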

■ 

B. The SPRT 

The sequential probability ratio test can be extended from 
simple binary hypothesis tests to the high dimensional case 
by simply considering n parallel SPRTs. Each individual 
SPRT operates by continuing to measure a component if the 
corresponding likelihood ratio is within an upper and lower 
boundary, and terminating measurement otherwise. For scalars
$A$ and $B$,

$$ \pi_{i,j+1} = \begin{cases} 1 & \text{if } A/j < t_i^{(j)} < B/j \\ 0 & \text{else} \end{cases} $$

where $t_i^{(j)}$ is the normalized log-likelihood ratio comprised
of all prior measurements (unlike sequential thresholding, in
which the likelihood ratio is only formed using measurements
from a single pass). If $t_i^{(j)} \leq A/j$, the SPRT labels index $i$
as not belonging to $S$, and if $t_i^{(j)} \geq B/j$, index $i$ is
assigned to $S$. For a thorough discussion of the SPRT, see [6].
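A single component-wise SPRT can be sketched as follows, again assuming the illustrative Gaussian pair $f_0 = \mathcal{N}(0,1)$, $f_1 = \mathcal{N}(\theta,1)$. The code tracks the cumulative log-likelihood ratio, so comparing it with $A$ and $B$ is the same as comparing the normalized ratio with $A/j$ and $B/j$. The boundary choice in `wald_boundaries` uses Wald's standard approximations, which are an assumption here rather than something specified in the text, and the `max_steps` cutoff is a practical safeguard added for the sketch.

```python
import math


def wald_boundaries(alpha, beta):
    """Approximate stopping boundaries (A, B) for target error
    probabilities alpha and beta (Wald's approximations)."""
    return math.log(beta / (1 - alpha)), math.log((1 - beta) / alpha)


def sprt_gaussian(sample, theta, A, B, max_steps=10_000):
    """Run one SPRT with f0 = N(0,1), f1 = N(theta,1).

    sample: zero-argument callable returning the next measurement.
    Returns (decision, number_of_measurements); decision is True when
    the component is assigned to S.
    """
    llr = 0.0  # cumulative (unnormalized) log-likelihood ratio
    for j in range(1, max_steps + 1):
        y = sample()
        llr += theta * y - theta ** 2 / 2
        if llr <= A:   # lower boundary crossed: label as null
            return False, j
        if llr >= B:   # upper boundary crossed: assign to S
            return True, j
    return llr > 0.0, max_steps  # cutoff reached (sketch-only fallback)


# Deterministic usage example with fixed measurement sequences.
high = iter([2.0] * 5)
low = iter([0.0] * 5)
result_high = sprt_gaussian(lambda: next(high), 2.0, -3.0, 3.0)
result_low = sprt_gaussian(lambda: next(low), 2.0, -3.0, 3.0)
```

Running $n$ such tests in parallel, one per component, gives the high dimensional extension described above.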

Sequential probability ratio tests are optimal for binary 
hypothesis tests in terms of minimum expected number of 
measurements for any error probabilities $\alpha$ and $\beta$ (shown
originally in [7]), and this optimality can be translated to
the high dimensional case. Consider a single component i, 
and the corresponding binary hypothesis test. To be thorough, 
we restate the optimal property of the SPRT in the following 
lemma. 

Lemma IV.2. Optimality of the SPRT for simple binary
tests [8] (p. 63). Consider an SPRT with expected number
of measurements $E_0[N]$ and $E_1[N]$, and corresponding error
probabilities $\alpha$ and $\beta$. Any other sequential test with
expected number of measurements $E_0[N]'$ and $E_1[N]'$ and error
probabilities $\alpha' \leq \alpha$ and $\beta' \leq \beta$ will also
have $E_0[N]' \geq E_0[N]$ and $E_1[N]' \geq E_1[N]$.

In short, no procedure with the smaller error probabilities can 
have fewer measurements in expectation than the SPRT. To 
translate the optimality of the SPRT to the high dimensional 
case, we introduce the following lemma. 

Lemma IV.3. Optimality of the SPRT. Consider $n$ component-wise
sequential probability ratio tests used to estimate $S$,
each with error probabilities $\alpha$ and $\beta$, and with a total of
$E[\sum_{i,j} \pi_{i,j}]$ measurements in expectation. Any other
component wise test with $\alpha' \leq \alpha$ and $\beta' \leq \beta$
will also have expected number of measurements
$E[\sum_{i,j} \pi_{i,j}]' \geq E[\sum_{i,j} \pi_{i,j}]$.

Proof: We can write the total expected number of
measurements as:

$$ E\Big[ \sum_{i,j} \pi_{i,j} \Big] = (n-s) E_0[N] + s E_1[N] $$

which is monotonically increasing in both $E_0[N]$ and $E_1[N]$.
Together with Lemma IV.2 this implies the lemma. ■

C. Comparison of the SPRT to Sequential Thresholding 

Although a fully rigorous proof is quite involved, using 
standard approximations for the sequential probability ratio 
test (again, see [6]) it is relatively straightforward to show the
SPRT does achieve the lower bound presented above. 

Sequential thresholding is similar in spirit to the SPRT. In 
many scenarios, however, implementing the SPRT can be sub- 
stantially more complicated, if not infeasible, when compared 
to sequential thresholding. To set the stopping boundaries, an 
SPRT requires knowledge of the underlying distributions as 
well as the level of sparsity s. Even when these are available, 
only approximations relating error probabilities to the stopping 
boundaries can be derived in closed-form. 

On the contrary, sequential thresholding does not require
knowledge of $s$. Since its sample requirements are within a
small factor of the lower bound, sequential thresholding
is automatically adaptive to unknown levels of sparsity.
Moreover, in practice, sequential thresholding needs only
approximate knowledge of the distributions to operate (such
that a substantial number of components that follow $f_0$ can be
eliminated on each pass).

V. Limitation of Non-Sequential Methods 

Our analysis would not be complete without comparison 
of sequential thresholding and the sequential lower bound to 
the performance limits of non-sequential methods. To do so, 
we analyze performance of any non-sequential method using 
Chernoff Information. 

Theorem V.1. Limitation of non-sequential testing. Consider
any non-sequential thresholding procedure. Provided

$$ m < \frac{\log n}{D(f_1\|f_0)} \qquad (11) $$

the family wise error rate goes to 1. To be precise, (11) is
equivalent to

$$ \lim_{n \to \infty} \frac{m}{\log n} < \frac{1}{D(f_1\|f_0)} $$

which implies $\lim_{n \to \infty} P(\mathcal{E}) = 1$.

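Proof: From [5], p. 386, (Chernoff Information) and by
(6) and (7), any non-sequential test fails provided

$$ \alpha \doteq e^{-m D(f_\lambda\|f_0)} > \frac{1}{n-s}, \qquad \beta \doteq e^{-m D(f_\lambda\|f_1)} > \frac{1}{s} $$

where

$$ f_\lambda = \frac{f_0^{\lambda} f_1^{1-\lambda}}{\int f_0^{\lambda} f_1^{1-\lambda}\, dy} $$

for $\lambda \in [0, 1]$. Hence, any non-sequential procedure fails
provided

$$ m < \min_{\lambda \in [0,1]} \max \left\{ \frac{\log(n-s)}{D(f_\lambda\|f_0)},\; \frac{\log s}{D(f_\lambda\|f_1)} \right\} $$

which is implied if

$$ m < \frac{\log n}{D(f_1\|f_0)} $$

completing the proof. ■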
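For intuition about the tilted family $f_\lambda \propto f_0^{\lambda} f_1^{1-\lambda}$ used in the proof, the Gaussian case can be worked out in closed form. This assumes, for illustration only, $f_0 = \mathcal{N}(0,1)$ and $f_1 = \mathcal{N}(\theta,1)$:

```latex
f_0(y)^{\lambda} f_1(y)^{1-\lambda}
  \propto \exp\!\Big(-\tfrac{1}{2}\big(y - (1-\lambda)\theta\big)^2\Big)
\;\Longrightarrow\;
f_\lambda = \mathcal{N}\big((1-\lambda)\theta,\, 1\big),
\qquad
D(f_\lambda\|f_0) = \frac{(1-\lambda)^2\theta^2}{2},
\quad
D(f_\lambda\|f_1) = \frac{\lambda^2\theta^2}{2}.
```

At $\lambda = 0$ the tilted density equals $f_1$, so $D(f_\lambda\|f_0) = \theta^2/2 = D(f_1\|f_0)$, the divergence that appears in the final bound on $m$.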

VI. Conclusion 

This paper showed sequential methods for support recovery 
of high dimensional sparse signals in noise can succeed using 
far fewer measurements than non-sequential methods. Specif- 
ically, non-sequential methods require the number of mea- 
surements to grow logarithmically with the dimension, while 
sequential methods succeed if the number of measurements 
grows logarithmically with the level of sparsity. Additionally, a 
simple procedure termed sequential thresholding comes within 
a small additive factor of the lower bound in terms of number 
of measurements per dimension. 